Which bus line among the AC Transit 19, 20, and 51A would benefit most from real-time information displays at some of their bus stops?
Real-time information displays inform transit riders about when the next bus is arriving at their stop. These displays provide convenience for all transit riders, but they are especially useful for those who lack smartphones or access to the Internet and are therefore unable to access bus arrival times by any other means, especially the elderly (Shrestha et al. 2017). Real-time information at bus stops provides the most benefit at stops served by routes with unreliable schedules and infrequent headways, as riders at these stops likely have longer or more unpredictable wait times. Installing real-time information can also enhance ridership on a given route: a New York City study found that real-time information was associated with increases in ridership, especially on long and heavily used routes (Brakewood et al. 2015).
In this project we analyze three bus routes operated by AC Transit: 19, 20, and 51A. These routes run through the cities of Oakland and Alameda, California. They serve multiple heavy rail (Bay Area Rapid Transit) stations, operate through Downtown Oakland, and serve neighborhoods of differing profiles. They also operate at various frequencies, with the 51A being the most frequent (every 10-15 minutes) and the 19 being the least frequent (every 30 minutes or more), which makes them a more representative sample of the bus routes operated by AC Transit. In the following analysis we look at demographic data for census tracts near bus stops as well as delay information for each route.
References:
Brakewood, C., Macfarlane, G.S. & Watkins, K. (2015). The Impact of Real-time Information on Bus Ridership in New York City. Transportation Research Part C: Emerging Technologies, 53, 59-75.
Shrestha, B.P., Millonig, A., Hounsell, N.B. & McDonald, M. (2017). Review of Public Transport Needs of Older People in European Context. Journal of Population Ageing, 10, 343-361.
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import matplotlib.patches as mpatches
from matplotlib.patches import Patch
import contextily as cx
import plotly.express as px
import folium
import networkx as nx
import osmnx as ox
We begin by showing the three bus lines (AC Transit 19, 20, and 51A) that we analyze in this project in the context of Oakland and Alameda.
bus_routes = gpd.read_file("data/bus-routes/filtered_bus_routes.shp")
The CRS of the bus routes shapefile is 4326, so we explicitly made it a geodataframe with that reference system.
bus_routes = gpd.GeoDataFrame(bus_routes,
geometry="geometry",
crs="EPSG:4326")
We then convert it to match the CRS of the contextily basemap.
bus_routes_wm = bus_routes.to_crs(epsg=3857)
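To make the reprojection step more concrete, here is a minimal sketch of the spherical Web Mercator formulas that underlie EPSG:3857. The function name and the sample point are our own illustrative choices; in the notebook itself, geopandas performs this conversion through `to_crs`.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 semi-major axis, the sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon_deg, lat_deg):
    """Project a longitude/latitude pair in degrees to Web Mercator metres."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


# Illustrative point near Downtown Oakland (coordinates chosen for this example only).
x, y = lonlat_to_web_mercator(-122.2711, 37.8044)
print(round(x), round(y))
```

The projected coordinates are in metres, which is why the basemap tiles and the route geometries line up once both use the same reference system.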
ax_br = bus_routes_wm.plot(column="PUB_RTE", legend=True, figsize=(10,12), linewidth=2, cmap="viridis_r",
legend_kwds={
'loc': 'upper right',
'bbox_to_anchor':(1,1),
'title': 'Line',
'title_fontsize': 16,
'fontsize': 14
})
cx.add_basemap(ax_br, source=cx.providers.CartoDB.Positron)
plt.title(label="AC Transit Bus Lines",
fontsize=16,
color="black");
We believed that including demographic variables in our analysis would allow us to understand the census tracts that the 19, 20, and 51A lines run along. We specifically decided to map the percentage of Black and Asian residents and the percentage of low-income households in these tracts, as these populations are more likely to use or depend on public transit for their travel. We also mapped the surrounding tracts by the percentage of workers who take public transportation, as we perceived commuting to be a vital trip purpose and one that could benefit from greater travel time predictability.
To understand the surrounding area, we downloaded Census data on demographics, housing, and transportation characteristics from Social Explorer and used geopandas's plot function to make choropleth maps.
census_tracts_data = pd.read_csv("data/census/census_tracts_data.csv")
census_tracts_data = census_tracts_data.dropna(axis=1,how="all")
We first convert the Social Explorer column names to more comprehensible ones.
columns_to_keep = ['Geo_FIPS',
'SE_A03001_001',
'SE_A03001_002',
'SE_A03001_003',
'SE_A03001_005',
'SE_A01001_011',
'SE_A01001_012',
'SE_A01001_013',
'SE_A14001_001',
'SE_A14001_002',
'SE_A14001_003',
'SE_A14001_004',
'SE_A14001_005',
'SE_A14001_006',
'SE_A14001_007',
'SE_A14001_008',
'SE_A09005_001',
'SE_A09005_003']
census_tracts_data = census_tracts_data[columns_to_keep]
census_tracts_data.columns = ['FIPS',
'Total Population',
'White Alone',
'Black or African American Alone',
'Asian Alone',
'65 to 74 years',
'75 to 84 years',
'85 years and above',
'Households',
'Less than $10,000',
'$10,000 to $14,999',
'$15,000 to $19,999',
'$20,000 to $24,999',
'$25,000 to 29,999',
'$30,000 to $34,999',
'$35,000 to $39,999',
'Workers 16 years and over',
'Public Transportation (Includes Taxicab)']
geom_census_data = pd.read_csv('data/census/bus_route_tracts.csv')
geom_census_data = geom_census_data.rename(columns = {"GEOID":"FIPS"})
geom_census_data = geom_census_data[["FIPS", "geometry"]]
We have two datasets, one with the Social Explorer data, and one that has the tract geometries. We merge these on the FIPS column.
joined_census_data = census_tracts_data.merge(right=geom_census_data, on="FIPS")
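A merge on FIPS only matches rows when the key has the same type and format in both tables. As a hedged sketch, assuming the two files might store the tract identifier differently (for example as an integer in one and as a zero-padded string in the other), a hypothetical helper such as `normalize_tract_fips` could bring both to the 11-character form before merging. The sample identifier below is illustrative.

```python
def normalize_tract_fips(value):
    """Return an 11-character, zero-padded census tract identifier as a string."""
    text = str(value).strip()
    if text.endswith(".0"):  # tolerate identifiers that were read as floats
        text = text[:-2]
    if not text.isdigit() or len(text) > 11:
        raise ValueError(f"not a tract FIPS code: {value!r}")
    return text.zfill(11)


# The same tract written three different ways normalizes to one key.
print(normalize_tract_fips(6001401100))
print(normalize_tract_fips("06001401100"))
print(normalize_tract_fips(6001401100.0))
```

In pandas, a helper like this could be applied to each key column with `Series.map` before calling `merge`.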
As there is now a geometry column, we convert this dataframe to a geodataframe.
joined_census_data["geometry"] = gpd.GeoSeries.from_wkt(joined_census_data["geometry"])
joined_census_data = gpd.GeoDataFrame(joined_census_data, geometry="geometry")
We create variables for percent Black, percent Asian, percent low income, and percent taking transit using the Social Explorer data, as we believe this is easier to understand than actual counts through choropleth maps. We also use the contextily library for their basemaps -- here we've used CartoDB's Positron basemap. We also modify the alpha value so that the tracts are more transparent.
joined_census_data['PCT_Black'] = joined_census_data['Black or African American Alone']/joined_census_data['Total Population']*100
bins_bl = [10, 20, 30, 40, 50]
bl_1 = mpatches.Patch(color='#fef0d9', label='0 to 10')
bl_2 = mpatches.Patch(color='#fdcc8a', label='10 to 20')
bl_3 = mpatches.Patch(color='#fc8d59', label='20 to 30')
bl_4 = mpatches.Patch(color='#e34a33', label='30 to 40')
bl_5 = mpatches.Patch(color='#b30000', label='40 to 50')
ax_bl = joined_census_data.plot(figsize=(12,10),
column='PCT_Black',
legend=True,
cmap='OrRd',
scheme='UserDefined',
classification_kwds={'bins': bins_bl},
linewidth=0.2,
edgecolor='grey',
alpha=0.8,
legend_kwds={
'loc': 'upper right',
'bbox_to_anchor':(1,1),
'title': 'Percent Black',
'title_fontsize': 15,
'fontsize': 14,
'handles': [bl_1, bl_2, bl_3, bl_4, bl_5]
})
cx.add_basemap(ax_bl, source=cx.providers.CartoDB.Positron)
plt.title(label="Surrounding tracts by percentage Black population",
fontsize=16,
color="black");
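The sketch below illustrates how a tract's percentage ends up in one of the legend classes. It assumes the user-defined bins act as inclusive upper bounds, which is our understanding of the `UserDefined` scheme, and it uses made-up counts. The helper names `percent` and `class_index` are hypothetical.

```python
from bisect import bisect_left


def percent(part, total):
    """Share of `part` in `total` as a percentage, or None when total is 0."""
    if total == 0:
        return None
    return part / total * 100


def class_index(value, upper_bounds):
    """Index of the first class whose upper bound is greater than or equal to value."""
    return bisect_left(upper_bounds, value)


bins_bl = [10, 20, 30, 40, 50]  # same upper bounds as the map
labels = ['0 to 10', '10 to 20', '20 to 30', '30 to 40', '40 to 50']

pct = percent(part=1200, total=4800)  # made-up counts for a single tract
print(pct, labels[class_index(pct, bins_bl)])
```

Two edge cases are worth keeping in mind: a tract with zero population has no defined percentage, and a value above the last bound returns an index of 5, for which this five-entry legend has no label.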
According to a study on transit ridership done by APTA, Black and Asian people make up a disproportionately high share of public transportation riders relative to their share of the US population. These are likely transit-dependent riders that use transit for a variety of trip purposes, including commuting to work. Thus, they would likely benefit from more timely information on their trips.
joined_census_data['PCT_Asian'] = joined_census_data['Asian Alone']/joined_census_data['Total Population']*100
bins_as = [20, 35, 50, 65, 85]
as_1 = mpatches.Patch(color='#fef0d9', label='0 to 20')
as_2 = mpatches.Patch(color='#fdcc8a', label='20 to 35')
as_3 = mpatches.Patch(color='#fc8d59', label='35 to 50')
as_4 = mpatches.Patch(color='#e34a33', label='50 to 65')
as_5 = mpatches.Patch(color='#b30000', label='65 to 85')
ax_as = joined_census_data.plot(figsize=(12,10),
column='PCT_Asian',
legend=True,
cmap='OrRd',
scheme='UserDefined',
classification_kwds={'bins': bins_as},
linewidth=0.2,
edgecolor='grey',
alpha=0.8,
legend_kwds={
'loc': 'upper right',
'bbox_to_anchor':(1,1),
'title': 'Percent Asian',
'title_fontsize': 15,
'fontsize': 14,
'handles': [as_1, as_2, as_3, as_4, as_5]
})
cx.add_basemap(ax_as, source=cx.providers.CartoDB.Positron)
plt.title(label="Surrounding tracts by percentage Asian population",
fontsize=16,
color="black");
The Bay Area, in particular, also has a large Asian population, with 27 percent of residents identifying as Asian American or Pacific Islander (AAPI).
We considered low-income households to be those with incomes of less than $40,000.
joined_census_data['Low_Income'] = joined_census_data['Less than $10,000'] + joined_census_data['$10,000 to $14,999'] +joined_census_data['$15,000 to $19,999'] + joined_census_data['$20,000 to $24,999'] + joined_census_data['$25,000 to 29,999'] + joined_census_data['$30,000 to $34,999'] + joined_census_data['$35,000 to $39,999']
joined_census_data['PCT_Low_Income'] = joined_census_data['Low_Income']/joined_census_data['Households']*100
bins_li = [10, 25, 40, 50, 65]
li_1 = mpatches.Patch(color='#fef0d9', label='0 to 10')
li_2 = mpatches.Patch(color='#fdcc8a', label='10 to 25')
li_3 = mpatches.Patch(color='#fc8d59', label='25 to 40')
li_4 = mpatches.Patch(color='#e34a33', label='40 to 50')
li_5 = mpatches.Patch(color='#b30000', label='50 to 65')
ax_li = joined_census_data.plot(figsize=(12,10),
column='PCT_Low_Income',
legend=True,
cmap='OrRd',
scheme='UserDefined',
classification_kwds={'bins': bins_li},
linewidth=0.2,
edgecolor='grey',
alpha=0.8,
legend_kwds={
'loc': 'upper right',
'bbox_to_anchor':(1,1),
'title': 'Percent low income',
'title_fontsize': 15,
'fontsize': 14,
'handles': [li_1, li_2, li_3, li_4, li_5]
})
cx.add_basemap(ax_li, source=cx.providers.CartoDB.Positron)
plt.title(label="Surrounding tracts by percentage low-income households (under $40,000)",
fontsize=16,
color="black");
Low-income households may be less likely to have access to a smartphone or a data plan and may, therefore, rely on static schedules for bus arrival times. Installing real-time information displays in census tracts with high percentages of low-income households could be especially beneficial for these residents as they may have no other means of accessing live bus arrival times.
joined_census_data['PCT_Take_Transit'] = joined_census_data['Public Transportation (Includes Taxicab)']/joined_census_data['Workers 16 years and over']*100
bins_tr = [20, 30, 45, 60, 70]
tr_1 = mpatches.Patch(color='#fef0d9', label='0 to 20')
tr_2 = mpatches.Patch(color='#fdcc8a', label='20 to 30')
tr_3 = mpatches.Patch(color='#fc8d59', label='30 to 45')
tr_4 = mpatches.Patch(color='#e34a33', label='45 to 60')
tr_5 = mpatches.Patch(color='#b30000', label='60 to 70')
ax_tr = joined_census_data.plot(figsize=(12,10),
column='PCT_Take_Transit',
legend=True,
cmap='OrRd',
scheme='UserDefined',
classification_kwds={'bins': bins_tr},
linewidth=0.2,
edgecolor='grey',
alpha=0.8,
legend_kwds={
'loc': 'upper right',
'bbox_to_anchor':(1,1),
'title': 'Percent taking transit',
'title_fontsize': 15,
'fontsize': 14,
'handles': [tr_1, tr_2, tr_3, tr_4, tr_5]
})
cx.add_basemap(ax_tr, source=cx.providers.CartoDB.Positron)
plt.title(label="Surrounding tracts by percentage of workers who take public transport",
fontsize=16,
color="black");
From looking at these maps, the tracts with more low-income households and greater transit ridership do seem to align with the tracts that have more people of color.
We were then interested in taking a deeper dive into the travel choices of residents in the census tracts with the greatest number of public transportation users. We started with a plot of the number of public transportation users in the surrounding tracts with the most residents using public transportation. In a later iteration, we expanded our analysis by looking at the mode share of residents within these high-ridership tracts. We find that the tracts with the highest transit ridership also seem to have higher counts of walking and biking. We recognize that taking transit often involves active transportation for first- and last-mile access, and we believe that these travelers could benefit from greater reliability to minimize their wait time at the stop. Thus, we also used data provided to us by AC Transit to plot the monthly on-time performance of the three lines in 2022, defined by AC Transit as the percentage of trips that departed up to five minutes late.
Given that there seemed to be some tracts with significantly high transit ridership, we were particularly interested in the mode share of households in the area, and so we determined the five tracts that had the most transit ridership along each bus line. In this file, we show these for the 51A line. We first load the Census data and the dataset containing the geometries and join the two together.
census_data_51a = pd.read_csv("data/51a-tract-data/tracts_data_51a.csv")
geom_data_51a = pd.read_csv("data/51a-tract-data/tracts_51A.csv")
geom_data_51a = geom_data_51a.rename(columns={"GEOID": "FIPS"})
geom_data_51a = geom_data_51a.loc[:, ["FIPS", "geometry"]]
data_51a = census_data_51a.merge(right=geom_data_51a, on="FIPS")
We then make this dataframe a geodataframe.
data_51a["geometry"] = gpd.GeoSeries.from_wkt(data_51a["geometry"])
data_51a = gpd.GeoDataFrame(data_51a, geometry="geometry")
We updated the headers of these columns as they were very long.
modes = ["Drove Alone", "Carpooled", "Public Transportation (Includes Taxicab)", "Motorcycle", "Bicycle", "Walked", "Other Means"]
for i in range(7):
    data_51a = data_51a.rename(columns={"Workers 16 Years and Over: " + modes[i]: modes[i]})
We then determine the top five tracts.
top_5_transit_tracts = data_51a.sort_values(by="Public Transportation (Includes Taxicab)", ascending = False).head(5)
We plot the count of residents using public transit with pandas's plot.barh function.
top_5_transit_tracts.plot.barh(x="FIPS",
y="Public Transportation (Includes Taxicab)",
xlabel="Count",
title="Top surrounding tracts with most residents using public transit",
legend=False);
As we learned about the plotly library the following week, we improved upon the chart by making an interactive bar chart that also showed the overall mode share of these tracts, with the interactivity allowing a viewer to see the specific counts for each mode. We converted our dataframe to be in a "long" dataframe format to work with plotly's bar function.
top_5_transit_tracts['Area Name'] = top_5_transit_tracts['Area Name'].str.replace("Census Tract ", "")
top_5_transit_tracts_long = pd.melt(top_5_transit_tracts, id_vars='Area Name', value_vars=modes)
fig_transit_tracts = px.bar(top_5_transit_tracts_long, x="Area Name", y="value", color="variable",
labels={
"value": "Count",
"variable": "Transportation mode",
},
title="Mode share in tracts with highest transit ridership")
fig_transit_tracts.update_layout(xaxis_title="Census Tract", legend=dict(font=dict(size=15)))
fig_transit_tracts.show()
We find that the tracts that have the highest transit ridership also seem to have higher counts of walking and biking. We recognize that taking transit often involves active transportation for first- and last-mile access.
We reached out to AC Transit for data on delays and performance. They shared with us data on on-time performance for these three lines, which we have plotted below.
otp_19 = pd.read_csv('data/on-time performance/On Time Performance_Line 19 (2022).csv')
otp_20 = pd.read_csv('data/on-time performance/On Time Performance_Line 20 (2022).csv')
otp_51A = pd.read_csv('data/on-time performance/On Time Performance_Line 51A (2022).csv')
def plot_otp(df, i, line):
    df.iloc[:,1] = df.iloc[:,1]*100
    df.iloc[:,2] = df.iloc[:,2]*100
    f = df.plot(ax=axes_otp[i], rot=90, legend=False, figsize=(11,5))
    f.set_ylim(ymin=0, ymax=100)
    if i == 0:
        f.set_ylabel("Percent")
    else:
        f.set_yticks([])
    f.set_xticks(np.linspace(0,11,12), labels=["Jan", "Feb", "Mar", "Apr", "May", "June", "July", "Aug", "Sep", "Oct", "Nov", "Dec"])
    f.set_title("Line " + line)
    if i == 2:
        line_blue = Line2D([0], [0], label='Towards Uptown Oakland', color='#1f77b4')
        line_orange = Line2D([0], [0], label='Towards East Oakland', color='#ff7f0e')
        f.legend(loc = 'lower right', handles=[line_blue, line_orange])
fig_otp, axes_otp = plt.subplots(nrows=1, ncols=3)
plot_otp(otp_19, 0, "19")
plot_otp(otp_20, 1, "20")
plot_otp(otp_51A, 2, "51A")
fig_otp.suptitle("AC Transit On-Time Performance");
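To illustrate what an on-time performance figure represents, the sketch below computes the share of trips that departed no more than five minutes late from a list of made-up departure delays. It assumes delays are recorded in seconds and that early departures count as on time. Both are our assumptions, and AC Transit's own calculation may differ.

```python
def on_time_percentage(delays_seconds, late_threshold=300):
    """Percentage of trips whose departure delay is at most `late_threshold` seconds."""
    if not delays_seconds:
        return None
    on_time = sum(1 for delay in delays_seconds if delay <= late_threshold)
    return on_time / len(delays_seconds) * 100


sample_delays = [30, 120, 0, 420, 290, 600, 45, 310]  # made-up values, in seconds
print(on_time_percentage(sample_delays))
```

Here five of the eight sample trips fall within the threshold, so the function returns 62.5.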
Finally, we decided to map the bus stops along each of the three lines with average delays that exceeded AC Transit’s on-time performance threshold of five minutes. We hoped that these maps would help us decide which stops along the most unreliable line would benefit from real-time information displays.
We collected real-time transit data via the Open 511 SIRI APIs. This data is provided in XML format and includes information such as the expected and aimed (scheduled) arrival and departure times of each bus. To determine which stops have the greatest average delays, we are gathering this real-time delay data at different times of day (peak and off-peak, AM and PM) throughout the week. The following interactive maps show bus stops with average delays over five minutes on Sunday afternoons.
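The following is a minimal sketch of how a delay could be derived from such a response. The XML fragment is hand-written and abbreviated for illustration. Its element names follow our understanding of the SIRI StopMonitoring layout and should be checked against an actual 511 response. The stop code and timestamps are made up.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {"siri": "http://www.siri.org.uk/siri"}

# Hand-written, abbreviated fragment (assumed structure, made-up values).
SAMPLE_XML = """<Siri xmlns="http://www.siri.org.uk/siri">
  <ServiceDelivery>
    <StopMonitoringDelivery>
      <MonitoredStopVisit>
        <MonitoredVehicleJourney>
          <LineRef>51A</LineRef>
          <MonitoredCall>
            <StopPointRef>55555</StopPointRef>
            <AimedArrivalTime>2022-11-06T21:00:00Z</AimedArrivalTime>
            <ExpectedArrivalTime>2022-11-06T21:06:30Z</ExpectedArrivalTime>
          </MonitoredCall>
        </MonitoredVehicleJourney>
      </MonitoredStopVisit>
    </StopMonitoringDelivery>
  </ServiceDelivery>
</Siri>"""


def parse_time(text):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(text.replace("Z", "+00:00"))


def arrival_delays(xml_text):
    """Return (stop_code, delay_in_seconds) for each call with both timestamps."""
    root = ET.fromstring(xml_text)
    rows = []
    for call in root.iterfind(".//siri:MonitoredCall", NS):
        aimed = call.findtext("siri:AimedArrivalTime", namespaces=NS)
        expected = call.findtext("siri:ExpectedArrivalTime", namespaces=NS)
        if not aimed or not expected:
            continue  # skip calls that lack one of the two timestamps
        delay = (parse_time(expected) - parse_time(aimed)).total_seconds()
        rows.append((call.findtext("siri:StopPointRef", namespaces=NS), delay))
    return rows


print(arrival_delays(SAMPLE_XML))
```

A positive value means the bus is expected later than scheduled. In this fragment the difference is 390 seconds, or six and a half minutes.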
stops_shp_file = gpd.read_file('data/bus-stops/UniqueStops_Fall22.shp')
# Write the GeoJSON copy to its own file so the source shapefile is not overwritten.
stops_shp_file.to_file('data/bus-stops/UniqueStops_Fall22.geojson', driver='GeoJSON')
We had shapefiles with data on all bus stops from AC Transit's Data and Resource Center, and we had previously filtered these by line for the three bus lines we consider.
delays_19 = pd.read_csv('data/real-time delays/Line 19.csv')
delays_20 = pd.read_csv('data/real-time delays/Line 20.csv')
delays_51A = pd.read_csv('data/real-time delays/Line 51A.csv')
We join these bus stops associated with each line to the shapefile that has a geometry column for us to plot.
joined_geom_data_19 = delays_19.merge(stops_shp_file, left_on='stop_code', right_on='stp_511_id', how = 'left')
joined_geom_data_20 = delays_20.merge(stops_shp_file, left_on='stop_code', right_on='stp_511_id', how = 'left')
joined_geom_data_51A = delays_51A.merge(stops_shp_file, left_on='stop_code', right_on='stp_511_id', how = 'left')
The following function takes in a geodataframe and filters it so that the remaining rows are only those stops that have average delays of greater than five minutes.
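def filter_for_delays(geom_data):
    # Keep stops whose average delay (sixth column, in seconds) exceeds five minutes.
    # .copy() avoids pandas's SettingWithCopyWarning when columns are added below.
    geom_data = geom_data[geom_data.iloc[:,5] > 300].copy()
    geom_data = geom_data.rename(columns={geom_data.columns[5]: "Average Delay"})
    geom_data['lon'] = geom_data.geometry.apply(lambda p: p.x)
    geom_data['lat'] = geom_data.geometry.apply(lambda p: p.y)
    return geom_data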
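For context, the sketch below shows one way repeated delay observations could be averaged per stop and then compared against the same 300-second (five-minute) threshold. The stop codes and delay values are made up, and the function name is hypothetical. It is not necessarily how the averages in our CSV files were produced.

```python
from collections import defaultdict
from statistics import mean


def stops_over_threshold(observations, threshold_seconds=300):
    """Average the delays per stop and keep stops whose mean exceeds the threshold."""
    by_stop = defaultdict(list)
    for stop_code, delay in observations:
        by_stop[stop_code].append(delay)
    averages = {stop: mean(delays) for stop, delays in by_stop.items()}
    return {stop: avg for stop, avg in averages.items() if avg > threshold_seconds}


# Made-up (stop_code, delay_in_seconds) observations.
observations = [("A", 390), ("A", 250), ("B", 120), ("B", 200), ("C", 400)]
print(stops_over_threshold(observations))
```

Stop A is kept even though one of its observations is under five minutes, because the filter applies to the average rather than to individual observations.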
The following function takes in a geodataframe and plots the filtered stops.
def generate_map(geom_data, line):
    filtered_geom_data = filter_for_delays(geom_data)
    fig = px.scatter_mapbox(filtered_geom_data,
                            lat='lat',
                            lon='lon',
                            zoom=12,
                            hover_name='stop_name',
                            hover_data=["Average Delay"],
                            mapbox_style="carto-positron")
    # options on the layout
    fig.update_layout(
        width = 800,
        height = 800,
        title_x=0.5,
        title='Stops with average delays over 5 minutes for AC Transit Line ' + line
    )
    return fig
map_51A = generate_map(joined_geom_data_51A, "51A")
map_51A
As we can see from the maps above, line 20 has the most delays. Line 19 is not even shown as it has no significant average delay. This is interesting, as the 20 runs less frequently than the 51A but more frequently than the 19. However, this might make delays more burdensome for riders waiting for the 20 at bus stops, as they cannot expect the regularity of the 51A and may be waiting at the stop for longer than expected. We found it impressive that the 19 had no significant delay during our analysis period, but this suggests that the stops along that line may not particularly benefit from real-time information.
In this exercise, we created several isochrone maps to show the walkshed of bus stops along routes 51A, 19, and 20 that experience heavy passenger use and/or excessive delays (greater than 5 minutes). We used summer weekday average boarding/alighting data for this segment, and the data was collected in 2022.
First we created the code to produce the isochrone map. We specified the latlon, the mode of transport (i.e. walk), the time intervals and the speed (and hence distance) at which the pedestrians will walk. We also found the center node by calculating the centroid and bounding coordinates. Then, we created a list of time/colors and reversed the order to make sure the longest walk times are at the edge. Next, we specified the polygons for each time area and reversed the colors so the outer polygon gets drawn first. Finally, we created the subplots, specified the isochrone boundaries, built a loop of the time and corresponding color, and added the center node legend and basemap.
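As a conceptual illustration of the walkshed idea, the sketch below finds every node that can be reached within a time budget on a tiny made-up street network, using Dijkstra's algorithm from the standard library. The node names and edge lengths are invented, and the edges are treated as one-way for simplicity. The notebook itself relies on osmnx and networkx for this step.

```python
import heapq


def reachable_within(graph, source, budget):
    """Return {node: travel_time} for every node reachable within `budget` minutes."""
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        elapsed, node = heapq.heappop(queue)
        if elapsed > best.get(node, float("inf")):
            continue  # a shorter path to this node was already processed
        for neighbor, cost in graph.get(node, []):
            total = elapsed + cost
            if total <= budget and total < best.get(neighbor, float("inf")):
                best[neighbor] = total
                heapq.heappush(queue, (total, neighbor))
    return best


meters_per_minute = 75  # same walking speed as in the notebook

# Made-up network: node -> list of (neighbor, edge length in metres).
edges_m = {
    "stop": [("a", 300), ("b", 900)],
    "a": [("c", 600)],
    "b": [("c", 150)],
    "c": [("d", 450)],
}
# Convert edge lengths to walking times in minutes.
graph = {u: [(v, length / meters_per_minute) for v, length in nbrs]
         for u, nbrs in edges_m.items()}

for minutes in [10, 15, 20]:
    print(minutes, sorted(reachable_within(graph, "stop", minutes)))
```

Each larger time budget reaches a superset of the nodes reached by the smaller one, which is why the isochrone polygons nest inside one another.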
This bus stop averages around 19 boardings on a summer weekday and experiences delays of 5 minutes or more.
latlon = [37.777649171356565, -122.27653651924717]
network_type = 'walk'
trip_times = [10,15,20]
meters_per_minute = 75
cmap = 'Oranges'
title = 'Distance one can walk from bus stop (Webster Street & Buena Vista Avenue)'
G = ox.graph_from_point(latlon, network_type=network_type, dist = 2000)
G = ox.project_graph(G, to_crs='epsg:3857')
gdf_nodes, gdf_edges = ox.graph_to_gdfs(G)
minx, miny, maxx, maxy = gdf_nodes.geometry.total_bounds
centroid_x = (maxx-minx)/2 + minx
centroid_y = (maxy-miny)/2 + miny
center_node = ox.distance.nearest_nodes(G,Y=centroid_y,X=centroid_x)
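gdf_edges['walk_time'] = gdf_edges['length']/meters_per_minute
# Copy the walk times onto the graph so that ego_graph measures the radius
# in minutes of walking rather than in number of edges.
nx.set_edge_attributes(G, gdf_edges['walk_time'].to_dict(), 'walk_time')
iso_colors = ox.plot.get_colors(n=len(trip_times),
                                cmap=cmap,
                                start=0,
                                return_hex=True)
time_color = list(zip(trip_times, iso_colors))
time_color.reverse()
for time, color in list(time_color):
    subgraph = nx.ego_graph(G, center_node, radius=time, distance='walk_time')
    for node in subgraph.nodes():
        gdf_nodes.loc[node,'time'] = time
        gdf_nodes.loc[node,'color'] = color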
gdf_nodes['color'].fillna('#cccccc', inplace=True)
isochrones = gdf_nodes.dissolve(by = "time")
isochrones = isochrones.convex_hull.reset_index(name='geometry')
isochrones.sort_values(by='time', ascending=False,inplace=True)
iso_colors.reverse()
fig, ax = plt.subplots(figsize=(10,15))
isochrones.boundary.plot(
ax=ax,
alpha=1,
linestyle='--',
color=iso_colors,
lw=2
)
isochrones.plot(
ax=ax,
alpha=0.2,
categorical=True,
color=iso_colors,
)
gdf_nodes.loc[[center_node]].plot(
ax=ax,
color='r',
marker='x',
markersize=50
)
legend_elements = [
Line2D([0], [0], marker='x', color='red', linestyle='',label='Walkshed from Bus Stop', markersize=6),
]
time_color.reverse()
for time,color in list(time_color):
    legend_item = Patch(facecolor=color, edgecolor=color, linestyle='--',linewidth=1,label=str(time)+' minutes',alpha=0.4)
    legend_elements.append(legend_item)
ax.legend(handles=legend_elements,loc='lower left')
ax.set_title(title,fontsize=15,pad=10)
ax.axis('off')
cx.add_basemap(ax,source=cx.providers.CartoDB.Positron)
Next, we wrapped this code in a function to avoid repeating it. We made latlon, cmap, and title arguments of isomap so that we only need to specify these values for each new map.
def isomap(latlon=[37.777649171356565, -122.27653651924717],cmap='Oranges', title='Distance one can walk from bus stop (Webster Street & Buena Vista Avenue)'):
    network_type = 'walk'
    trip_times = [10,15,20]
    meters_per_minute = 75
    G = ox.graph_from_point(latlon, network_type=network_type, dist = 2000)
    G = ox.project_graph(G, to_crs='epsg:3857')
    gdf_nodes, gdf_edges = ox.graph_to_gdfs(G)
    minx, miny, maxx, maxy = gdf_nodes.geometry.total_bounds
    centroid_x = (maxx-minx)/2 + minx
    centroid_y = (maxy-miny)/2 + miny
    center_node = ox.distance.nearest_nodes(G,Y=centroid_y,X=centroid_x)
    gdf_edges['walk_time'] = gdf_edges['length']/meters_per_minute
    # Copy the walk times onto the graph so that ego_graph measures the radius
    # in minutes of walking rather than in number of edges.
    nx.set_edge_attributes(G, gdf_edges['walk_time'].to_dict(), 'walk_time')
    iso_colors = ox.plot.get_colors(n=len(trip_times),
                                    cmap=cmap,
                                    start=0,
                                    return_hex=True)
    time_color = list(zip(trip_times, iso_colors))
    time_color.reverse()
    for time, color in list(time_color):
        subgraph = nx.ego_graph(G, center_node, radius=time, distance='walk_time')
        for node in subgraph.nodes():
            gdf_nodes.loc[node,'time'] = time
            gdf_nodes.loc[node,'color'] = color
    gdf_nodes['color'].fillna('#cccccc', inplace=True)
    isochrones = gdf_nodes.dissolve(by = "time")
    isochrones = isochrones.convex_hull.reset_index(name='geometry')
    isochrones.sort_values(by='time', ascending=False,inplace=True)
    iso_colors.reverse()
    fig, ax = plt.subplots(figsize=(10,15))
    isochrones.boundary.plot(
        ax=ax,
        alpha=1,
        linestyle='--',
        color=iso_colors,
        lw=2
    )
    isochrones.plot(
        ax=ax,
        alpha=0.2,
        categorical=True,
        color=iso_colors,
    )
    gdf_nodes.loc[[center_node]].plot(
        ax=ax,
        color='r',
        marker='x',
        markersize=50
    )
    legend_elements = [
        Line2D([0], [0], marker='x', color='red', linestyle='',label='Walkshed from Bus Stop', markersize=6),
    ]
    time_color.reverse()
    for time,color in list(time_color):
        legend_item = Patch(facecolor=color, edgecolor=color, linestyle='--',linewidth=1,label=str(time)+' minutes',alpha=0.4)
        legend_elements.append(legend_item)
    ax.legend(handles=legend_elements,loc='lower left')
    ax.set_title(title,fontsize=15,pad=10)
    ax.axis('off')
    cx.add_basemap(ax,source=cx.providers.CartoDB.Positron)
Now we can create maps from just inputting the relevant lat/lons, cmap, and title.
This bus stop sees an average of 37 boardings on a summer weekday and experiences delays of 5 minutes or more.
isomap(latlon=[37.76525044249728, -122.24212123646979], cmap='Oranges', title= 'Distance one can walk from bus stop (Park Street & Santa Clara Ave)')
This bus stop sees the heaviest use, with an average of 55 passenger boardings per summer weekday, and experiences delays of 5 minutes or more.
isomap(latlon=[37.75872168546182, -122.25276060928036], cmap='Oranges', title= 'Distance one can walk from bus stop (Alameda South Shore Center)')
While this bus stop, Broadway and 12th, is not necessarily delay-prone, it sees a whopping average of 112 passenger boardings per summer weekday.
isomap(latlon=[37.809371094799374, -122.26794248548805], cmap='Blues', title= 'Distance one can walk from bus stop (Broadway & 12th)')
These last two stops experience delays and average around 35 passenger boardings per summer weekday.
isomap(latlon=[37.818851182970235, -122.26202176824387], cmap='Blues', title= 'Distance one can walk from bus stop (Broadway & 30th)')
isomap(latlon=[37.82492760092457, -122.2581806241156], cmap='Blues', title= 'Distance one can walk from bus stop (Broadway & West MacArthur Blvd)')
We will be looking at AC Transit’s bus stop shelter guide to understand criteria that the agency uses to make determinations about where to install bus shelters. We believe that these stops could potentially be good candidates for installing real time information displays as they may already have the necessary infrastructure in place, such as electrical wiring and sufficient sidewalk space. However, we hope to reach our own conclusions from our analysis that may confirm or refute AC Transit’s recommendations. We also will be performing additional analysis using ridership data obtained from AC Transit to further support our conclusions of which stops could benefit from the installation of real-time information displays. Finally, we will be collecting more real-time data from the 511 API for a variety of times of day and days of the week to better understand patterns of delays.
In this file, the visuals were made by:
All three members collected and used Census data for this assignment as well as previous assignments. Jackson additionally worked with on-time performance data provided by AC Transit and the average bus delays that Purva calculated. Purva collected real-time data by calling the 511 API, parsing through the XML response, and calculating the average delays.
Hamzah updated the project proposal and wrote the "Background" section above. Jackson wrote the section headings/descriptions for the three parts in our analysis, and updated this for Group Assignment 3 by providing greater context and analysis on our visualizations. Purva compiled and documented all the code in this file.